Method and apparatus for data searching and computer-readable medium for supplying
program instructions.

BACKGROUND OF THE INVENTION
This invention relates to a method and apparatus for data searching in a computer
environment.

Particularly, but not exclusively, the invention relates to a method and apparatus for
[...] in accordance with a user supplied search query.

The invention also relates to a computer readable medium operable for supplying
program instructions for the method and apparatus.






It is known to provide a method and apparatus which can receive a user supplied [...]

The result of any large database search may well comprise many, perhaps a very large
number of, 'hits', this being due to lack of knowledge or memory and/or the lack of a
particular search capability. Thus, the user may remember only part of the [...]

One object of the invention is to make available a [...] and/or locating blocks of text
in a database of text files.

Another object is to provide an apparatus and method for data searching able to better
discriminate specific blocks of text identified by a search query.

[...] method comprising:-

[...] said data;

searching the data to locate matches between the data and the respective data
fragments; and

identifying a portion of said data from the address of a match between the data and
the first data fragment in the sequence to the address of a match between the data and
the last data fragment in the sequence.

Advantageously, the method further includes:-

searching the data to locate the first match between the data and the first data
fragment in the sequence;

searching the data to locate the last match between the data and the last data
fragment in the sequence; and

identifying a portion of said data between the addresses of said first match and said
last match.

The method may also include:-

searching the data to locate the first match between the data and the first data
fragment in the sequence;

searching the data to locate matches between the data and the or each subsequent
data fragment in the sequence;

identifying a portion of said data from the address of the first match between the
data and the first data fragment to the address of the first match between the data
and the last data fragment in the sequence subsequent to at least one match between
the data and any intermediate data fragment in the sequence.

In each case, the method may include displaying said data upon a display screen and
highlighting said identified portion of data.

According to a second aspect of the invention, there is provided, in a computer
environment, [...] comprising:-

receiving a search query comprising two or more data fragments contained in
sequence in said data item;

searching said data to locate matches with the respective data fragments which
matches are non-overlapping and in the same sequence as in said search query.

According to a third aspect of the invention, there is provided, in a computer environment,
a method for searching a database to locate a data item, the method comprising:-

storing two or more data fragments contained in sequence in said data item;
searching the database to locate the first match with the first data fragment, and
searching the database to locate [...]

said searching directed in dependence upon the location(s) in the database of
matches with the or each previous data fragment.






According to a fourth aspect of the invention, there is provided, in a computer environment,
a method for searching a database to locate a specific [...]

storing two or more data fragments contained in sequence in said data item;

searching said database to locate the first match with the first data fragment in said
sequence and storing the start address of said first match;

from the end address within the database at which said first match is located,
searching said database to locate the last match with the last data fragment in said
sequence and storing the end address of said last match;

from the said start address of said first match to the start address of said last match,
searching said database to locate all matches with the first data fragment in said
sequence; and

for each subsequent data fragment [...] address of a first match with the previous
fragment to the said [...] address of said last fragment, [...] locate all matches [...]
subsequent [...]

[...] data to locate a portion of said data [...]

input means for receiving a sequence of two or more data fragments;

[...]

control means connected to said input means and said data supply means and
operable for searching data made available by the data supply means to locate matches
between the data and the respective data fragments and for registering
information identifying a portion of said data from the address of a match between
the data and the first data fragment [...]

[...] computer code for directing the computer to search said data to locate matches
between the data and the respective data fragments; and

computer code for causing the computer to identify a portion of said data from the
location in said data of a first match between the data and the first data fragment in
said sequence to the location of a match between the data and the last data fragment
in the sequence.

[...] ensuing particular description given with reference to the following drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the invention, and to show how the same may be carried into
effect, reference will now be made, by way of example, to the accompanying drawings, in
which:

Figure 1 is a block diagram of a computing environment; and

Figure 2 is a flow chart showing a data search process.

DETAILED DESCRIPTION

The method described herein is intended to provide the following function: [...]
[...] the form of a sequence of two or more text fragments, [...]
contains the text fragments in the same sequence as encoded in the search request. The
search is considered successful if such a minimal portion of text is found.

[...] text
fragments, but it may contain additional instances of one or more text fragments from the
search request and they may appear in [...]
request. Two or more text fragments may be identical, but the search algorithm to be
described will treat each of them as separate entities.

If the search request contains only one text fragment, the minimal portion of text is simply
the first occurrence of the text fragment in the given text. Specific examples of situations
where the described function may be useful are as follows.

Example 1

Let

"Once upon a time... palace... queen lived... roses in the garden"

be a search request. Here there are four text fragments: "Once upon a time", "palace",
"queen lived" and "roses in the garden". Note that the text fragments are separated by
ellipses (the separator symbol used here). Leading and trailing blanks in a text fragment,
if present, are assumed to be part of the text fragment.

Now, if we are given a piece of text, say,

"The old man began his story thus. Once upon a time there was a glorious king who built
a palace so [...] that it was simply [...] The king had a
queen and the queen lived in the palace. Inside the palace there was a rose garden.
Every day, her daughter, the princess, would go to [...]
[...] One day, palace [...] to pick some roses [...]
queen lived. When he saw the queen..."

and the search request above, the task is to find the minimal portion of text, from the
beginning of the text, which satisfies the search request.

[...] it was [...] palace... would go to see the [...]

Example 2

[...], 123 Great Wood Street, [...]
Jack S Brody, 431 Pine Avenue, Rose Town, [...]

USA

[...] This technique can be used on any
database for data mining. The intelligence will lie in what information the application
program is asked to collate from the database, how the collated information is formatted,
how the pointers to the database records are maintained [...] the collated [...]

[...] occurring across two or more records.

"[...]... Proteins... [...]... Smith... Introduction...
Conclusions... DNA sequence"

[...] can be used in [...] situations [...]

[...] database by an application program can be used to search for people regarding
whom only fragmentary information is available. Here the structure of the database
[...] (See Example 2 above.)

Web search [...] When keyword
searches on the Web produce a very long list of documents, a search algorithm such
as this can automate the further search of the listed documents for their relevance,
especially when used by domain experts searching documents in their domain of
expertise. (See Example 3 above.)

4 Search for code segments [...]

Figure 1 [...] embodiment of a computing environment in which the present
invention may be implemented.

[...] operating system with a graphical user interface.

[...] a CD-ROM drive 8 [...] a CD-ROM 9.

The program instructions are stored on the CD-ROM 9 from which they are read by the
drive 8. However, as will be well understood by those skilled in the art, the instructions as
read by the drive 8 may not be usable directly from the CD-ROM 9 but rather may be
loaded into the memory 6 and stored on the hard [...]
from there. Also, the instructions may need to be decompressed from the CD-ROM using
appropriate decompression software [...]
case, be received and stored by the computer 1 in a sequence [...]
are stored on the CD-ROM.






In addition to the CD-ROM drive 8, or instead of it, any other suitable input means could
be provided, for example a floppy-disc drive or a tape drive or a wireless communication
device, such as an infra-red receiver (none of these devices being shown).

Finally, the computer 1 also comprises a telephone modem 10 through which the computer
is able temporarily to link up to the Internet via telephone line 11, a modem 12 located at
the premises of an Internet service provider (ISP), and the ISP's computer 13.

The computer 1 does not have to [...] Instead, it could form
part of a network (not shown) along with other computers to which it is connected on a
permanent basis. It could also [...]
called intranet, i.e., a group of data holding sites similar to Internet sites or URL's and
arranged in the same way [...]
[...] of a particular company. Instead of modem 10, the computer 1 could have a
digital hard-wired link to the ISP's computer 13 or the computer 1 could itself comprise a
permanently connected Internet site (URL) whether or not acting as an ISP for other remote
users. In other words, instead of the [...]
keyboard 3, it may be available to remote users working through temporary or permanent
links to computer 1 acting as ISP or simply as an Internet site.



The data to be searched could be data which has been entered [...] the
keyboard 3, perhaps over a long period, and stored on the hard disc drive 5 or on another
CD-ROM entered in the drive 8, assuming the drive and the other CD-ROM are capable of
rewriting data to the CD-ROM, or on the aforementioned optional floppy disc or tape
drive. The data to be searched could also be data which [...]
with the program instructions, or it could be data which is available from say a file server
(not shown) forming part of the aforementioned network, or from data holding sites within
the Internet or the aforementioned intranet.






The search method will be described below with reference to drawing figure 2. First
however it will be appreciated that the given text and/or the text fragments in the search
request can be formatted to a standard form before [...]

[...] In this standard form, for example,
multiple consecutive blanks can be replaced by a single blank; a blank before certain
punctuation marks (stop, comma, semicolon, colon, hyphen, exclamation mark, question
mark, etc.) [...]

Standard formatting helps, for example, if the text being searched has not
been professionally edited.
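The standard formatting just described can be sketched in C as follows. This is only an illustration, not code from the specification: the function name normalize_blanks is hypothetical, and dropping the blank that precedes a punctuation mark is an assumption, because the action applied to such blanks is not legible in the text.

```c
#include <assert.h>
#include <string.h>

/* Rewrites s in place (hypothetical helper):
 *  - a run of consecutive blanks becomes a single blank;
 *  - a blank directly before one of . , ; : - ! ? is dropped (assumed rule). */
void normalize_blanks(char *s)
{
    assert(s != NULL);
    char *out = s;                       /* write position, never ahead of in */
    for (const char *in = s; *in != '\0'; in++) {
        if (*in == ' ') {
            while (in[1] == ' ')         /* advance to the last blank of the run */
                in++;
            if (in[1] != '\0' && strchr(".,;:-!?", in[1]) != NULL)
                continue;                /* blank before punctuation: skip it */
            *out++ = ' ';
        } else {
            *out++ = *in;
        }
    }
    *out = '\0';
}
```

Applying the same function to both the given text and each text fragment keeps the two in the same form, so that a later substring search compares like with like.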

[...] embedded [...]

[...] The flow chart of
figure 2 is more general as regards the implementation language although it is still at least
partly reflective of a C programming environment. In the flowchart, the reference numbers
correspond to the paragraph numbers below. The search method comprises the following
steps:






1 Create a string array variable and call it frag[] and fill this array with the text
fragments in the same sequence as they appear in the search request. Let there be n
such strings stored in frag[0] to frag[n-1]. For Example 1 above, it will produce:

frag[0] = "Once upon a time"
frag[1] = "palace"
frag[2] = "queen lived"
frag[3] = "roses in the garden"
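Step 1 can be sketched in C as below. The function name split_request and the fixed-size output array are hypothetical. The sketch trims blanks around each fragment so that its output matches the four values listed for Example 1; that trimming is an assumption, and the two trim loops can be removed if surrounding blanks are meant to stay part of a fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits request in place at every "..." separator and stores a pointer to
 * each non-empty fragment in frag[] (at most max of them). Returns n, the
 * number of fragments stored. The request buffer is modified. */
size_t split_request(char *request, char *frag[], size_t max)
{
    assert(request != NULL && frag != NULL);
    size_t n = 0;
    char *p = request;
    while (p != NULL && n < max) {
        char *sep = strstr(p, "...");
        char *end = (sep != NULL) ? sep : p + strlen(p);
        char *next = (sep != NULL) ? sep + 3 : NULL;
        while (p < end && *p == ' ')          /* trim leading blanks (assumed)  */
            p++;
        while (end > p && end[-1] == ' ')     /* trim trailing blanks (assumed) */
            end--;
        *end = '\0';                          /* terminate this fragment */
        if (p < end)
            frag[n++] = p;                    /* skip empty pieces */
        p = next;
    }
    return n;
}
```

Because the split is done in place, the request must be held in a writable buffer rather than a string literal.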



For each variable frag[i] create a corresponding list variable, ptrs_to_frag[i], to store
[...]

[...] terminate the process with an error message ("[...]
text search algorithm").

[...]

4 Put i = 0. Define Sptr = Bstart.

5 [...] strlen(frag[i])), and store them in ascending order in ptrs_to_frag[i]. If no pointer is
found, then terminate the procedure since the search request cannot be fulfilled.
Otherwise, put

Sptr = first pointer stored in ptrs_to_frag[i] + strlen(frag[i]).

This ensures that frag[i+1], if found, will be preceded by at least one instance of
frag[i] in B without any overlap between the two frag[]s.

6 Increase i by 1. If i < n go to step 5, else go to step 7.
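The collection of pointers in step 5 might look like the following C sketch. It works with offsets into the text rather than raw pointers, and the name collect_matches is hypothetical. The upper limit is read here as "the match must end at or before a given last offset"; that reading is an assumption, since the start of step 5 is not legible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stores in out[], in ascending order, the offsets of the occurrences of frag
 * in text that start at or after `from` and end at or before `last`
 * (inclusive). Stops after max entries. Returns the number stored. */
size_t collect_matches(const char *text, size_t from, size_t last,
                       const char *frag, size_t out[], size_t max)
{
    size_t len = strlen(frag);
    assert(len > 0 && from <= strlen(text));
    size_t count = 0;
    const char *p = strstr(text + from, frag);
    while (p != NULL && count < max) {
        size_t start = (size_t)(p - text);
        if (start + len - 1 > last)
            break;                       /* later matches end even further right */
        out[count++] = start;
        p = strstr(p + 1, frag);         /* continue just past this match start */
    }
    return count;
}
```

A return value of 0 corresponds to the case in which the procedure terminates because the request cannot be fulfilled; otherwise out[0] plus the fragment length gives the next value of Sptr.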
7 [...] If we define lptr as given by

lptr = first pointer stored in ptrs_to_frag[n-1]

then

bend = lptr + strlen(frag[n-1]) - 1.

Except for lptr, delete all other pointers saved in the list ptrs_to_frag[n-1].

8 Put i = n-2. [...]



9 [...] already ensured that there be at least one instance of frag[i] preceding the
instances of frag[i+1] whose pointers are saved in ptrs_to_frag[i+1] without any
overlap between frag[i] and frag[i+1].

10 Decrement i by 1. If i >= 0 go to step 9, else go to step 11.

11 Put bstart = last pointer stored in ptrs_to_frag[0].

Note that steps 2 and 3 define B, which may be highlighted by the code executing this
algorithm, while steps 7 to 11 define b. This may also [...]

A worked example

Consider Example 1 cited above.

The given text is:

"Many stories begin with 'Once upon a time' such as the one that now follows: The old
man began his story thus. Once upon a time there was a glorious [...]
large [...] simply the largest one had ever built. The king had a queen and the queen
lived in the palace. Inside the palace there was a rose garden. Every day, her daughter the
princess, would go to see the roses in the garden. It gave her a lot of pleasure to [...]
[...] their fragrance. [...] to pick some roses in the
[...] queen lived. When he saw the queen..."

and the search request is

"Once upon a time... palace... queen lived... roses in the garden".

Step 1. n = 4. The string array frag[] is



Received from < 512 458 8536 > at 7/29/03 12:20:52 PM [Eastern Daylight Time] 



07/29/2083 10:17 512-458-8536 ANTHONY ENGLAND PAGE 17 




frag[0] = "Once upon a time"
frag[1] = "palace"
frag[2] = "queen lived"
frag[3] = "roses in the garden"

Step 2. Let the starting address of the given text be, say, 1000. A search for frag[0] =
"Once upon a time" will return the pointer 1025. Thus Bstart = 1025. Since a pointer has
been found, we go to step 3.

Step 3. A search for [...] frag[n-1] [...]
text will return the pointer Lptr = 1522. Thus Bend = Lptr + strlen(frag[n-1]) - 1 = 1522
+ 19 - 1 = 1540. Now go to step 4.

Step 4. Put i = 0. Define Sptr = Bstart = 1025.

Steps 5 and 6. These steps produce the result

ptrs_to_frag[0] has the entries 1025, 1111.
ptrs_to_frag[1] has the entries 1166, 1281, 1300, 1488.
ptrs_to_frag[2] has the entries 1262.
ptrs_to_frag[3] has the entries 1390, 1522.

Since all the lists are populated with pointers, we go to step 7.

Step 7. The minimal portion of text b [...] From
ptrs_to_frag[3] we find

lptr = 1390
bend = 1390 + 19 - 1 = 1408

Keep lptr and delete all other pointers from ptrs_to_frag[3] so that
ptrs_to_frag[3] now has the single entry 1390.

Step 8. Put i = 4-2 = 2. Since n > 2, go to step 9.

Steps 9 and 10. These steps produce the result

ptrs_to_frag[2] has the entries 1262.
ptrs_to_frag[1] has the entries 1166.
ptrs_to_frag[0] has the entries 1025, 1111.

Step 11. bstart = 1111. Thus b starts at 1111 and ends at 1408.
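The backward pass of steps 7 to 11 can be checked against the numbers of this worked example with a short C sketch. The function name prune_backward and the representation of each list as a sorted array plus a count are assumptions made for illustration. Because each list is in ascending order, deleting every pointer whose match would reach the last pointer kept in the following list amounts to shortening the list from its tail.

```c
#include <assert.h>
#include <stddef.h>

/* ptrs[i] is the ascending address list for frag[i], cnt[i] its length and
 * len[i] the value of strlen(frag[i]). On return cnt[i] is the number of
 * entries kept at the front of each list. */
void prune_backward(const long *const ptrs[], size_t cnt[],
                    const size_t len[], size_t n)
{
    assert(n >= 1 && cnt[n - 1] >= 1);
    cnt[n - 1] = 1;                              /* step 7: keep only lptr */
    for (size_t i = n - 1; i-- > 0; ) {          /* steps 8 to 10: i = n-2 .. 0 */
        assert(cnt[i + 1] >= 1);
        long limit = ptrs[i + 1][cnt[i + 1] - 1];
        while (cnt[i] > 0 && ptrs[i][cnt[i] - 1] + (long)len[i] - 1 >= limit)
            cnt[i]--;                            /* match would reach the next fragment */
    }
}
```

With the four lists produced by steps 5 and 6 and the fragment lengths 16, 6, 11 and 19, the kept entries are those listed under steps 9 and 10, the last entry of the first list is 1111, and 1390 + 19 - 1 gives 1408.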
Pseudocode fragment for the search algorithm

// Given: text, n, and frag[0] to frag[n-1].

// Step 2
Bend = NULL;
Bstart = strstr(text, frag[0]);
if (!Bstart) terminate execution.

// Step 3
Lptr = strstr(Bstart + strlen(frag[0]), frag[n-1]);
if (!Lptr) terminate execution.

while (TRUE) {
    Bend = Lptr + strlen(frag[n-1]) - 1;
    Next = strstr(Bend + 1, frag[n-1]);
    if (!Next) break;
    Lptr = Next;    // keep Lptr at the last occurrence found so far
}

// Step 4
i = 0;
Sptr = Bstart;

// Steps 5 and 6
for (i = 0; i < n; i++) {
    Save all pointers beginning at or lying between Sptr and Lptr
    - strlen(frag[i]) in the list ptrs_to_frag[i].

    If the list ptrs_to_frag[i] is empty, terminate execution;
    else redefine Sptr = first pointer stored in ptrs_to_frag[i] +
    strlen(frag[i]).
}

// Step 7
lptr = first pointer stored in ptrs_to_frag[n-1];
bend = lptr + strlen(frag[n-1]) - 1;
delete all pointers stored in the list ptrs_to_frag[n-1] except lptr.

// Steps 8 to 10
for (i = n-2; i >= 0; i--) {
    Delete from ptrs_to_frag[i] all such pointers to which, if strlen(frag[i]) - 1
    is added, will point to an address >= the last pointer in ptrs_to_frag[i+1].
}

// Step 11
bstart = last pointer saved in ptrs_to_frag[0].
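For readers who want something compilable, the following C sketch computes the same bstart and bend as the pseudocode while simplifying it: instead of storing full pointer lists, it keeps only the one pointer per fragment that the steps end up using. This is a deliberate substitution rather than the method as written, and the names find_minimal_portion and last_match_before are hypothetical. Fragments are assumed to be non-empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Last occurrence of frag that starts at or after lo and ends before limit
 * (start + strlen(frag) <= limit). Returns NULL if there is none. */
const char *last_match_before(const char *lo, const char *limit,
                              const char *frag)
{
    size_t len = strlen(frag);
    assert(len > 0);
    const char *best = NULL;
    for (const char *p = strstr(lo, frag); p != NULL && p + len <= limit;
         p = strstr(p + 1, frag))
        best = p;
    return best;
}

/* Returns 1 and sets *bstart and *bend (inclusive) to the minimal portion of
 * text containing frag[0..n-1] in order without overlap; returns 0 if the
 * request cannot be fulfilled. */
int find_minimal_portion(const char *text, const char *const frag[], size_t n,
                         const char **bstart, const char **bend)
{
    assert(n >= 1);
    /* Forward pass: earliest non-overlapping chain of matches. */
    const char *sptr = text;
    const char *first = NULL;
    for (size_t i = 0; i < n; i++) {
        assert(frag[i][0] != '\0');
        first = strstr(sptr, frag[i]);
        if (first == NULL)
            return 0;
        sptr = first + strlen(frag[i]);
    }
    /* Step 7: first now plays the role of lptr. */
    *bend = first + strlen(frag[n - 1]) - 1;
    /* Steps 8 to 11: for i = n-2 .. 0 take the latest match of frag[i] that
     * ends before the match chosen for frag[i+1]. */
    const char *next = first;
    for (size_t i = n - 1; i-- > 0; ) {
        next = last_match_before(text, next, frag[i]);
        assert(next != NULL);    /* the forward pass guarantees a candidate */
    }
    *bstart = next;
    return 1;
}
```

For a request with a single fragment the backward loop does not run, so the result is simply the first occurrence of that fragment, in line with the single-fragment case described earlier.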

Whilst a particular preferred embodiment of the invention has been shown and described
herein it will be understood that persons of skill in the art may modify the embodiment and
that such modifications and developments are within the purview of the invention as
described or claimed.






